Estimation of metagenome size and structure in an experimental soil microbiota from low coverage next-generation sequence data.
Abstract
AIMS: A major challenge in metagenome studies is to estimate the true size of all combined genomes. Here, we present a novel approach to estimate the size of all combined genomes for low coverage next-generation sequencing (NGS) data through empirically determined copy numbers of random DNA fragments. METHODS AND RESULTS: Size estimates were based on analyses of two experimental soil micro-ecosystems, simulating soil with and without earthworms. Our analyses showed combined genome sizes of about 10^11 nucleotides (log10 of about 11) for each of the soil micro-ecosystems, as estimated from qPCR-determined copy numbers of random DNA fragments. This corresponds to more than 20,000 unique bacterial genomes in each sample. There seemed, however, to be a bacterial subpopulation in the earthworm soil that was absent from the non-earthworm soil. To describe the structure of the metagenomes, both total DNA and amplified 16S rRNA gene sequence libraries were generated with 454 sequencing. Bioinformatic analysis of the 454 sequence libraries showed a large functional but low taxonomic overlap between the samples with and without earthworms. A neutrality test indicated that rare species have a competitive advantage over abundant species in both micro-ecosystems, providing a potential explanation for the large metagenome sizes. CONCLUSIONS: We have shown that the soil metagenome is very large and that the large size is probably a consequence of top-down selection of the dominant bacterial species. SIGNIFICANCE AND IMPACT OF THE STUDY: Estimates of metagenome size from low coverage NGS data will be important for guiding future NGS set-ups.
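The step from a combined size of about 10^11 nucleotides to roughly 20,000 genomes can be checked with simple arithmetic. The sketch below is only illustrative: the abstract does not state the average genome size the authors used, so the 5 Mb value and the function name `genomes_from_metagenome` are assumptions made for this example.

```python
def genomes_from_metagenome(total_nt: float, mean_genome_nt: float) -> float:
    """Return how many genomes of a given mean size fit in a combined size."""
    if total_nt <= 0 or mean_genome_nt <= 0:
        raise ValueError("sizes must be positive")
    return total_nt / mean_genome_nt


# Combined genome size reported in the abstract: about 10^11 nucleotides.
total_nt = 10 ** 11

# ASSUMPTION: mean bacterial genome size of 5 Mb (not given in the abstract).
mean_genome_nt = 5e6

estimate = genomes_from_metagenome(total_nt, mean_genome_nt)
print(f"{estimate:,.0f} genome equivalents")
```

Under this assumption the result is 20,000 genome equivalents; any smaller assumed mean genome size gives a larger count, which is consistent with the "more than 20,000" wording.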
Similar resources
Metagenomic sequence of saline desert microbiota from wild ass sanctuary, Little Rann of Kutch, Gujarat, India
We report the metagenome of a saline desert soil sample from the Little Rann of Kutch, Gujarat State, India. The metagenome consisted of 633,760 sequences totaling 141,307,202 bp with 56% G + C content. The metagenome sequence data are available in the EBI Metagenomics database under accession no. ERP005612. Community metagenomics revealed a total of 1802 species belonging to 43 different phyla with dominatin...
Filtration and Normalization of Sequencing Read Data in Whole-Metagenome Shotgun Samples
Ever-increasing affordability of next-generation sequencing makes whole-metagenome sequencing an attractive alternative to traditional 16S rDNA, RFLP, or culturing approaches for the analysis of microbiome samples. The advantage of whole-metagenome sequencing is that it allows direct inference of the metabolic capacity and physiological features of the studied metagenome without reliance on the...
Next Generation Sequencing and its Application in the Study of Microbiome in Plant Diseases Suppressive Soils
Progress in next-generation sequencing has played a significant role in ecological studies of microbial populations. These advances have led to the rapid development of metagenomics studies (analysis of the DNA of microbial communities without the need for culturing). Many statistical and computational tools and metagenomics databases have made huge amounts of data available. In this research, i...
Tropical Soil Metagenome Library Reveals Complex Microbial Assemblage
In this work, we characterized the metagenome of a Malaysian mangrove soil sample via next generation sequencing (NGS). Shotgun NGS data analysis revealed high diversity of microbes from Bacteria and Archaea domains. The metabolic potential of the metagenome was reconstructed using the NGS data and the SEED classification in MEGAN shows abundance of virulence factor genes, implying th...
Estimating optimal window size for analysis of low-coverage next-generation sequence data
MOTIVATION Current high-throughput sequencing has greatly transformed genome sequence analysis. In the context of very low-coverage sequencing (<0.1×), performing 'binning' or 'windowing' on mapped short sequences ('reads') is critical to extract genomic information of interest for further evaluation, such as copy-number alteration analysis. If the window size is too small, many windows will ex...
Journal title:
- Journal of Applied Microbiology
Volume 114, Issue 1
Pages: -
Publication date: 2013